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DISCUSSION 
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University of Alberta 

1. Introduction. Hallin, Paindaveine and Siman — hereafter HPS — are to 
be congratulated for the appreciation their paper is receiving from The An- 
nals of Statistics. We personally are indebted for the attention they devoted 
to our paper, [11] — hereafter KM. The topic is quite delicate and multi- 
faceted, hence all the misunderstandings we allege below show deficiencies 
of our exposition rather than anything else. 

2. This is not a quantile. The altered title of Magritte's painting entitling 
this section can be seen as a gnomic expression of some subtle, nevertheless 
fundamental differences between HPS and KM, a distinct "philosophy" in 
the language of HPS. While HPS state in their first sentence their intent to 
"propose a definition of multivariate quantiles," KM in their second sentence 
maintain 

While such an objective could be mistaken for yet another attempt in the 
ongoing quest for "multivariate quantiles" ... we would like to stress that we 
differ in the position that no multivariate generalization of the quantile concept 
may be needed at all — . . . 

Instead, our intention was to discuss "certain aspects of using quantiles to 
obtain insights about multivariate data," suggesting that interesting insights 
about multivariate data can be obtained by looking at univariate quantiles 
of projections. Nothing beyond concepts already well established in data- 
analytic practice: see [12] regarding univariate quantiles (the concept we 
feel primarily rooted in the order of the numerical scale rather than in their 
secondary L 1 characterization), and any textbook on multivariate analysis 
regarding projections. 

We admit that KM might have not accentuated the word "univariate" 
enough — nevertheless, nothing like "projection quantiles" is mentioned, nor 



Received August 2009. 

1 Supported by the Natural Sciences and Engineering Research Council of Canada. 

This is an electronic reprint of the original article published by the 

Institute of Mathematical Statistics in The Annals of Statistics, 

2010, Vol. 38, No. 2, 685-693. This reprint differs from the original in pagination 

and typographic detail. 



1 



2 



L. KONG AND I. MIZERA 



anything like r-quantile or qKM;i/u is introduced there. Put verbally, to es- 
cape sometimes serpentine notation: KM do not consider lines orthogonal to 
the quantile directions " multivariate quantiles," but merely consider them a 
graphical tool to visualize univariate quantiles of the corresponding projec- 
tions. If HPS write 

"The resulting quantile contours (the collections, for fixed r, of qKM;i/u's do 
not enjoy the properties (independence with respect to to the choice of an 
origin, affine-equivariance, nestedness, etc.) one is expecting from a quantile 
concept." 

We would like to repeat that "quantile biplots" (which HPS use as a straw 
man in their Section 3.2 and elsewhere) were brought up by KM not as a 
recommendation for practical use, but as something that shows the pitfalls 
of attempts to summarize quantiles of projections graphically, something 
that eventually vindicates the use of depth contours — which is what HPS 
arrive to anyway, if it comes to quantile contours. If HPS surmise 

Contrary to Kong and Mizera's, however, the 7r TU quantile hyperplanes do 
enjoy all the desirable properties of a well-behaved quantile concept. 

We ask: contrary to Kong and Mizera's what? Contrary to "quantile contours"— 
which, being collections of objects are compared here to "quantile hyper- 
planes," that is, single objects? Or contrary to something else KM threw 
in the garbage? There is no need to construct an artificial conflict with an 
imaginary adversary: if HPS really want to compare multivariate quantiles, 
they can take any proposal from the long string of references given in their 
second paragraph, the string that (rightly!) does not contain KM. 

To dispel an impression that we are engaging here in some nihilistic word- 
play, let us consider an illustrative example. It will be a comparison; not of 
quantile concepts, but of possible ways of obtaining some interpretable in- 
sights from multivariate data. Suppose that we have a dataset concerning 
weight and height of several subjects. A possible quantity to look at is the 
Body Mass Index, BMI, as defined in Section 7 of HPS. If we use the trick 
we learned from [15], and consider the data on the logarithmic scale, then 
log(BMI) is log(weight) — 2 log(height), a linear combination of the new 
variables; its (univariate!) quantiles correspond to those of the projection 
into the "log(BMI) direction," the direction of the vector (1, —2). To visual- 
ize this, we may rotate the data so that the "log(BMI) direction" becomes 
vertical (Figure 1). The values of log(BMI) (up to a scale factor, which we 
do not worry about, because we are after univariate quantiles which are 
scale-equivariant) then correspond to the projections of the datapoints onto 
the ordinate; a horizontal line drawn through the chosen quantile (p = 0.75, 
say) then indicates which subjects have (log) BMI above the third quartile, 
which below, and how the data are divided in this respect — not a big deal 
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orthogonal complement to log(BMI) 

Fig. 1. The solid line is not a quantile, but merely indicates the location of the third 
sample quartile of the projection of the data to the ordinate. The broken line is the proposed 
multivariate quantile. Top left corner is Magritte. 



perhaps, but something real, something that, say, a lay person like a health 
professional may be able to understand. 

Now, given the same direction and same p = 0.75, HPS would suggest to fit 
the regression quantile instead (the broken line in Figure 1). Let us attempt 
an interpretation. We know that the line shows the (estimated) conditional 
(0.75)-quantile. This sounds promising — however: conditional on what? The 
projection of the data to the direction orthogonal to the log(BMI), showing 
a quantity proportional to 21og(weight) + log(height)? What is the meaning 
of the latter? Even if we take an affine transformation instead (relying on 
equivariance) , and transform the data so that the horizontal axis corresponds 
to some interpretable variable, for instance to log(weight): what is then the 
meaning of the quantiles of log(BMI) conditionally on log(weight)? Can we 
explain that — to a lay person, or to anybody? 

3. It's better to solve the right problem approximately. . . . than the 
wrong problem exactly. The ubiquitous quote attributed to Tukey by [1] 
is invoked here not because we would like to suggest that we solved the 
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"right" and somebody else the "wrong" problem, but because we believe 
that it expresses the principal tenet of practical data analysis: the emphasis 
is on having the job done, and much less on finding the "best" way of doing 
it. 

Our principal goal in KM — and here we probably erred again not stat- 
ing that more bluntly — was to show how depth contours can be "regressed 
on covariates." HPS comment that this issue is "briefly addressed" in our 
Section 11.3; well, other issues might take more space, but we believe that 
the solution we gave is complete, including a numerical realization and also 
theoretical analysis (in Section 9 of KM) . 

After seeing an early presentation of what later became [15], we wondered 
whether a similar construction cannot be accomplished using depth contours, 
as earlier suggested in the growth-chart context by [14]. It may be of interest 
here that our initial idea was to use the Laine's characterization of depth 
contours, as published in [8] — that is, along the lines how perhaps HPS would 
suggest to tackle the problem. Our later decision to abandon this approach 
may be interpreted as our ineptitude, or lack of good judgment — although 
we do not feel it exactly that way. Some technical aspects may be of interest. 

Abstracting from certain computational details, the problem is the fol- 
lowing: for any given direction tiGS 1 , consider the response values (y^,z^), 
arising from rotating the original bivariate response (yk,Zk) s ° that y%, the 
transformed yk, corresponds to the ordinate. What is needed then is a quan- 
tile regression of yt on z£, and also on the time covariate x^, the regression 
that satisfies the following: the dependence of y% on z£, for fixed Xf., should 
be linear (so that Laine's characterization applies), but at the same time 
the dependence of y^ on x^ (with fixed, say) should have a flexible, that 
is, nonparametric form. The latter is important, because after seeing several 
real biometric growth charts — see KM or [15] — it is clear that hardly any 
parametric model can be expected to describe those data well. 

The need to fit a parametric— nonparametric quantile regression makes the 
problem intricate, and not that straightforward within certain methodolo- 
gies (think kernel regression, for instance). Fitting a model where everything 
is linear would be easy (as can be seen from HPS); however, the nonpara- 
metric dependence on x^ is really vital for achieving a good description of 
the data here. The only strategy we were able to conceive was via a penal- 
ized approach adapted to our particular situation. Let us explain its main 
features; for simplicity, the superscript u will be suppressed in the notation. 

The standard formulation for bivariate nonparametric fitting — as dis- 
cussed, for instance, in [10] — seeks the fit / as a solution of the optimization 
task 

n 

(3.1) ^Pr{yk - f(zk,Xk)) + AJ(/) ^min!, 

k=i 1 
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where J can be, depending on what amount of smoothness we require from 
the fit, either the classical thin-plate penalty 



(3.2) 



J(f) 



dz 2 



+ 



or some of its L 1 variants, say, 

\d 2 f 



(3.3) 



J(f) 



dz 2 



+ 



d 2 f 
dz dx 



d 2 f 



+ 



dx 2 



dzdx 



dz dx 



+ 



d 2 f 



dx 2 



dzdx. 



It is well-known — see, for instance, [6] — that (3.1) is the Lagrangian version 
of the constrained regularization formulation 



(3.4) 



/.PriVk - f(zk,Xk)) min! subject to J(f) < A, 
k=i 



as both yield the same solutions for appropriately related values of A and 
A. The penalties (3.2) or (3.3), however, yield / that is nonparametric (that 
is, generally nonlinear) in both z and x. A possible way to make / linear in 
z for fixed x is to enforce d 2 f/dz 2 to be zero; the formulation (3.4) then 
becomes 



k=i 1 



subject to 



d 2 f 



dz 2 



< and J(f) < A, 



(3.5) 

where J is a penalty obtained, say, from (3.2) or (3.3) by dropping the 
d 2 f /dz 2 term from the integral. For instance, (3.3) leads to the penalty 



(3.6) 



J(f) 



d 2 f 



dz dx 



+ 



d 2 f 



dx 2 



dz dx: 



however, it is also possible to consider the penalty 

\d 2 f 



(3.7) 



J{f) 



dx 2 



dzdx 



instead. It is not clear whether the mixed-derivative term has to be kept 
or dropped — whether the dependence on z and x is additive or not. The 
Lagrange equivalent of (3.5) is a formulation with two tuning parameters, 



(3.8) 



fc=i 



d 2 f 



dz 2 



+ A 2 J(/) c Hmin! 



with Ai set very large, so that it effectively pushes the derivative in z to zero, 
and A2 chosen to obtain a fit flexible in x. For the quadratic penalty (3.2), 
the alternatives are analogous; the use of the squared second derivative with 
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respect to (3.5) or (3.8) amounts to the same effect, the onfy substantial 
difference is the form of J. 

Although we were able to implement a solution in this vein, and the nu- 
merical experiments gave somewhat promising results, there were serious 
issues. First, it is not clear whether to use (3.6) or (3.7) — both gave plau- 
sible, albeit somewhat different fits. Second, there is the notorious problem 
of selection of A2 — not that unsurmountable for one regression, but here we 
have to deal with a parametrized family of regressions, an interesting concep- 
tual problem, which we would, however, prefer to tackle separately at some 
other time. Third, and most important: we suddenly realized that we are 
chasing two rabbits simultaneously — even if we succeeded in overcoming the 
technological issues, any criticism of the methodology we would adopt for 
quantile regression would negatively affect also the proposed methodology 
of regressing on depth contours: in other words, if the results would not be 
convincing, it would not be clear whether it was problem of the directional 
characterization of depth contours, or the problem of the nonpar ametric 
quantile regression methodology. 

In such an unfortunate situation the solution came as the proverbial egg 
of Columbus: why not fit constants instead of lines?! That amounts to pro- 
jecting (zk,yk) into the direction of u, which yields a univariate response 
w k — which is then fitted by nonparametric quantile regression on x^. We 
sacrifice some nice mathematics (curiously, HPS find the property we use 
"somewhat surprising" — but it is in fact obvious; it is Laine's character- 
ization that is nontrivial!), but we gain a lot of autonomy: the required 
nonparametric quantile regressions now involve only one covariate, which 
means that there are quite a few methods out there proposed for this task. 
The users can choose one they prefer — for instance, if they want to avoid 
"quantile crossing phenomenon," presented as unavoidable in Section 6 of 
HPS, and thus obtain the contours that are nested, they can use recent 
methods dealing with this phenomenon, proposed, for instance, by [4] or [5]. 

So, by switching to univariate projections, we (i) got the job done; (ii) 
obtained approach that is independent of the finesses of the chosen quantile 
regression methodology; and (in) had our attention turned to the data- 
analytic insights obtained by univariate projections, the philosophy that 
enabled us to avoid pursuit of nebulous concepts like multivariate quantiles. 

If we also took some satisfaction in making one step ahead (or sidewise) of 
Laine, we would like to stress that this in no case meant the universal a priori 
dismissal of that (or any other) idea as unsuitable to lead to the solution of 
the same problem — and perhaps better and even easier. Provided, of course, 
it is the same problem — we maintain that the nonparametric dependence 
on x is really essential. HPS hint at the possible application of local linear 
methods here; we will await their, or anybody else's relevant implementation 
with great curiosity. As put by Anonymous (quoted with permission): "The 
early bird gets the worm. The second mouse gets the cheese." 
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4. One country, two systems. Despite all the subtle differences aired 
here, we agree that HPS and KM have a lot in common: they both ad- 
dress depth contours by adopting a sort of directional approach. Regression 
aspects having been discussed to some extent in the previous section, we 
concentrate now on the possible impact on the computation of the depth 
contours (in the "static" situation, without covariates). 

Here HPS should be credited for striking upon an important aspect: exact 
algorithms for higher dimensions — which do exist (despite the impression of 
HPS); see [2, 3] and the references there — indeed involve elements of linear 
programming. The topic is considerably challenging: the classical result of 
[7] shows that the problem is NP-hard when also the dimension is the part 
of the input, and gives lower bounds on best possible complexities of exact 
algorithms. The theoretical complexities derived by [2] depend themselves 
on the complexity of linear programming — a well-known chapter for itself. 

However, rather than speculating on potential higher-dimensional imple- 
mentations, let us comment on our personal experience with good old dimen- 
sion two. Although obtaining a working routine for depth contours seems to 
be straightforward there, given the existing literature and (much less abun- 
dant than the latter, but still existing) implementations, the truth is not 
that simple. The implementation of some of the proposed algorithms, al- 
though in principle possible, would be so laborious that from the practical 
point of view, these algorithms may be considered unimplementable. Other 
algorithms may work reasonably well for moderate sample sizes, of order of 
hundreds or thousands, but are prohibitively slow for datasets one hundred 
times bigger, the datasets that typically occur in the applications mentioned 
in Section 3. 

In such a case, instead of pursuing the mathematical vision of ideal algo- 
rithm, it may be more useful to think carefully what in fact is wanted. We 
believe that HPS, unlike "certain mathematicians without personal knowl- 
edge of the subject" (to borrow an idiom from R. A. Fisher's repertory), 
know well that exact computation is an illusion. One can start from linear 
programming: for handling large datasets, iterative interior point methods 
are used — compare [9]. Even when sticking to the simplex method, hoping 
that the reality will exhibit its average probabilistic rather than its expo- 
nential worst-case complexity, we have to keep in mind that many of the 
standard, "exact" linear algebra operations are iterative, that is, approx- 
imate. And even if agreeing to use only "pure" Gaussian elimination, we 
still stumble upon floating-point computer arithmetic; exact algorithms in 
computational geometry require the use of exact arithmetic, an algorithmic 
toolbox typically not under the command of statisticians (and often neither 
of computer scientists). 
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If we want to draw a curve, we draw one hundred or one thousand 
small segments; if we want to solve an optimization, we employ an itera- 
tive method. For computing depth contours in KM, we do not propose to 
sample the unit sphere S fc_1 (that is, we do not use any random mecha- 
nism, as the language of HPS might imply), but select K (typically several 
thousand) equispaced (more sophisticated strategies are possible) directions 
in S 1 (with no ambition for general k). Computing univariate quantiles is 
simple and quick — the standard R [13] function quant ile uses the straight- 
forward 0{n\ogn) algorithm, the more sophisticated kuantile of [9] an 
algorithm with the best possible, O(n) complexity; the ambiguities arising 
with possible nonunique results are solved in a straightforward fashion. (The 
ambiguities in linear programming solutions can be dealt with as well, but 
require an implementation that is returning all extremal points of the solu- 




FlG. 2. Left panels show the quantiles of projections and right the directional regression 
quantiles, each for 201 equispaced directions. The lower row shows the results for randomly 
perturbed data. 
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tion set, not a usual feature; also, unlike with quantiles of projections, where 
we know that we have to use the inf version, it is not clear which version 
of linear-programming solution yields the right thing. Centroid? Steiner's 
point?) The most expensive operation here, 0(n), is the vector multiplica- 
tion obtaining the projection itself; the contour constructing algorithm has 
complexity linear in K, as the directions come already sorted. The summary 
complexity is 0(nK), and the program runs pretty fast, easily accommodat- 
ing data in the size of millions. The resulting contour is indeed approximate, 
but KM prove theoretically (Theorems 8 and 9) that the quality of approxi- 
mation increases with K; it may decrease with k, even "extremely" as HPS 
claim (despite showing any evidence for that) — but as already mentioned, 
general k was not our ambition. 

Let us conclude this section with an example showing pitfalls of certain 
templates of thinking. Consider the dataset with points that all lie on a 
line (this is perhaps somehow contrived, but is the situation with perfectly 
linearly dependent data that impossible?). For simplicity, we will take them 
uniformly spaced on some segment. The depth contour for p = 0.1, say, is 
then exactly the part of the segment that contains 80% of the datapoints, 
with 10% on each side chopped off. If the orientation of the segment is 
random, the method proposed by KM is indeed unlikely to pick up the 
exact facet; but the result, with only 201 directions used, is pretty close 
(as perhaps can be seen in the upper left panel of Figure 2). On the other 
hand, directional quantile regressions all coincide with the line through the 
segment; to chop off the tails, and thus obtain the correct solution, the 
segment needs to be oriented vertically; all potential quantile regression then 
intersect in the right point, but the standard quantile regression stumbles 
upon the fact that the design matrix is singular, making typical quantile 
regression programs exit with an error message; would be interesting to 
watch how the parametric linear programming code of HPS handles that. 

Nevertheless, not having the latter at hand, we palliate (or cure?) the 
problem by adding some random noise, as shown in the lower panels of 
Figure 2. Then both approaches return the same answer (as can be perhaps 
seen on the lower right panel of Figure 2). 

5. Conclusion. We are grateful to Editors of The Annals of Statistics 
and especially to Xuming He for inviting us to write this note, and for 
valuable editorial suggestions; we also greatly benefited from discussions 
with Roger Koenker. The second author would like to return warmly the 
thanks addressed to his person in the Acknowledgement of the discussed 
paper: whatever he did, he did with pleasure. 
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